Abstract

The encoder and decoder for lossy data compression of binary memoryless sources are developed
on the basis of a specific type of nonmonotonic perceptron. Statistical mechanical analysis indicates
that the potential ability of the perceptron-based code saturates the theoretically achievable limit
in most cases, although performing the compression exactly is computationally difficult. To resolve
this difficulty, we provide a computationally tractable approximation algorithm using belief
propagation (BP), which is a current standard algorithm of probabilistic inference. With several
approximations and heuristics, the BP-based algorithm exhibits performance that is close to the
achievable limit in a practical time scale in optimal cases.
I. INTRODUCTION



Lossy data compression is a core technology of contemporary broadband communication.
The best achievable performance of lossy compression is theoretically provided by
rate-distortion theorems, which were first proved by Shannon for memoryless sources.
Unfortunately, Shannon's proofs are not constructive and suggest few clues for how to design
practical codes. Consequently, no practical schemes saturating the potentially optimal
performance of lossy compression represented by the rate-distortion function (RDF) have
been found yet, even for simple information sources. Therefore, the quest for better lossy
compression codes remains a major problem in the field of information theory (IT).

Recent research on error correcting codes has revealed a similarity between IT and
statistical mechanics (SM) of disordered systems. Because it has been shown that methods
from SM can be useful to analyse subjects of IT, it is natural to expect that a similar
approach might also bring about novel developments in lossy compression.

This research is motivated by that expectation. Specifically, we propose a simple
compression code for uniformly biased binary data devised on input-output relations of a
perceptron. Theoretical evaluation based on the replica method (RM) indicates that this code
potentially saturates the RDF in most cases, although performing the compression exactly is
computationally difficult. To resolve this difficulty, we develop a computationally tractable
algorithm based on belief propagation (BP), which offers performance that approaches
the RDF in a practical time scale when optimally tuned.



II. LOSSY DATA COMPRESSION 



We describe a general scenario for lossy data compression of memoryless sources. Original
data is denoted as $y = (y^1, y^2, \ldots, y^M)$, which is assumed to comprise a sequence of $M$
discrete or continuous random variables that are generated independently from an identical
stationary distribution $p(y)$. The purpose of lossy compression is to compress $y$ into a binary
expression $s = (s_1, s_2, \ldots, s_N)$ ($s_i \in \{+1, -1\}$), allowing a certain amount of distortion
between the original data $y$ and its representative vector $\tilde{y} = (\tilde{y}^1, \tilde{y}^2, \ldots, \tilde{y}^M)$ when $\tilde{y}$ is
retrieved from $s$.

In this study, distortion is measured using a distortion function that is assumed to be
defined in a component-wise manner as $\mathcal{D}(y, \tilde{y}) = \sum_{\mu=1}^{M} d(y^\mu, \tilde{y}^\mu)$, where $d(y^\mu, \tilde{y}^\mu) \ge 0$. A
code $C$ is specified by a map $\tilde{y}(s; C) : s \to \tilde{y}$, which is used in the restoration phase. This
map also reasonably determines the compression phase as

$$ s(y; C) = \mathop{\mathrm{argmin}}_{s} \{ \mathcal{D}(y, \tilde{y}(s; C)) \}, \tag{1} $$

where $\mathrm{argmin}_s \{\cdots\}$ represents the argument $s$ that minimises $\cdots$. When $C$ is generated
from a certain code ensemble, typical codes satisfy the fidelity criterion

$$ \frac{1}{M} \min_{s} \mathcal{D}(y, \tilde{y}(s; C)) = \frac{1}{M} \min_{s} \left\{ \sum_{\mu=1}^{M} d(y^\mu, \tilde{y}^\mu(s; C)) \right\} < D, \tag{2} $$

for a given permissible distortion $D$ and typical original data $y$ with probability 1 in the
limit $M, N \to \infty$ maintaining the coding rate $R = N/M$ constant, if and only if $R$ is larger
than a certain critical rate $R_c(D)$ that is termed the rate-distortion function.
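
For the binary sources studied in this paper with Hamming distortion, the rate-distortion function has a standard closed form, $R_c(D) = H_2(p) - H_2(D)$ for $D < \min(p, 1-p)$ and $0$ otherwise, where $H_2$ is the binary entropy in bits. This is the textbook result for a Bernoulli source and is not written out in the text above; the function names below are illustrative. A minimal Python sketch:

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H2(p) in bits, with H2(0) = H2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def rate_distortion(p: float, d: float) -> float:
    """R(D) = H2(p) - H2(D) for D < min(p, 1 - p), and 0 beyond that point."""
    if d >= min(p, 1.0 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(d)


if __name__ == "__main__":
    # p is the probability of the symbol +1; d is the permitted bit error rate.
    for p in (0.2, 0.5, 0.8):
        print(p, [round(rate_distortion(p, d), 3) for d in (0.0, 0.05, 0.1, 0.2)])
```

Because $H_2(p) = H_2(1-p)$, this sketch returns the same curve for $p = 0.2$ and $p = 0.8$.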

However, for finite $M$ and $N$, any code has a finite probability $P_F$ of breaking the fidelity
criterion (2), even for $R > R_c(D)$. Similarly, for $R < R_c(D)$, Eq. (2) is satisfied with a certain
probability $P_S$. For reasonable code ensembles, the averages of these probabilities are expected to
decay exponentially with respect to $M$ when the data length $M$ is sufficiently large. Therefore,
the two error exponents $\alpha_A(D, R) = \lim_{M \to \infty} -(1/M) \ln \langle P_F \rangle_C$ for $R > R_c(D)$ and
$\alpha_B(D, R) = \lim_{M \to \infty} -(1/M) \ln \langle P_S \rangle_C$ for $R < R_c(D)$, where $\langle \cdots \rangle_C$ represents the average
over the code ensemble, can be used to characterise the potential ability of the ensemble at
finite data lengths.

III. COMPRESSION BY PERCEPTRON AND THEORETICAL EVALUATION 

It is conjectured that the components of s are preferably unbiased and uncorrelated in 
order to minimise loss of information in the original data from a binary information source, 
which implies that the entropy per bit in s must be maximised. On the other hand, in 
order to reduce the distortion, the representative vector $\tilde{y}(s; C)$ should be placed close to
the typical sequences of the original data that are biased. Unfortunately, it is difficult to 
construct a code that satisfies these two requirements using only linear transformations 
over the Boolean field because a linear transformation generally reduces statistical bias in 
sequences, which implies that one cannot produce a biased representative vector from the 
unbiased compressed sequence. 
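
The bias-reducing effect of a linear map over the Boolean field can be illustrated with a single parity check. In the $\pm 1$ representation used here, the XOR of $n$ bits is their product, and for independent bits with $P(+1) = p$ the product equals $+1$ with probability $\{1 + (2p-1)^n\}/2$ (the standard piling-up relation, which is not stated in the text). The sketch below, with an illustrative function name, checks this by exact enumeration:

```python
from itertools import product


def parity_bias(p_plus: float, n: int) -> float:
    """Exact P(product of n i.i.d. +/-1 bits = +1), each bit being +1 with probability p_plus."""
    total = 0.0
    for bits in product((+1, -1), repeat=n):
        prob = 1.0
        sign = 1
        for b in bits:
            prob *= p_plus if b == +1 else 1.0 - p_plus
            sign *= b
        if sign == +1:
            total += prob
    return total


if __name__ == "__main__":
    # The probability of +1 drifts towards 1/2 as more bits enter the parity.
    for n in (1, 2, 3, 5, 8):
        print(n, round(parity_bias(0.8, n), 4), round((1 + 0.6 ** n) / 2, 4))
```

An unbiased input ($p = 1/2$) stays unbiased for every $n$, which is why a parity-based decoder alone cannot create the bias that the representative vector needs.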
One method to design a code that has the above properties is to introduce a nonlinear
transformation. A perceptron provides a simple scheme for carrying out this task. To
specify lossy data compression codes for binary original data $y \in \{+1, -1\}^M$ generated
from a memoryless source, we define a map by utilising perceptrons from the compressed
expression $s \in \{+1, -1\}^N$ to the representative sequence $\tilde{y}(s; C) \in \{+1, -1\}^M$ as

$$ \tilde{y}^\mu(s; \{x^\mu\}) = f\left( \frac{1}{\sqrt{N}} \sum_{i=1}^{N} x_i^\mu s_i \right) \quad (\mu = 1, 2, \ldots, M), \tag{3} $$

where $f(\cdot)$ is a function for which the output is limited to $\{+1, -1\}$ and $x^{\mu = 1, 2, \ldots, M}$ are
randomly predetermined $N$-dimensional vectors that are generated from an $N$-dimensional
normal distribution $P(x) = (\sqrt{2\pi})^{-N} \exp[-|x|^2/2]$. These vectors are known to the encoder
and decoder. We adopt an output function $f_k(u) = 1$ for $|u| < k$, and $-1$ otherwise, which
eventually offers optimal performance.

We measure the distortion by the Hamming distance $\mathcal{D}(y, \tilde{y}(s; \{x^\mu\})) = \sum_{\mu=1}^{M} \{ 1 - y^\mu f_k( \sum_{i=1}^{N} x_i^\mu s_i / \sqrt{N} ) \} / 2$.
Then, the compression phase for the given data $y$ can be defined as finding a vector $s$ that
minimises the resulting distortion $\mathcal{D}(y, \tilde{y}(s; \{x^\mu\}))$, and the retrieval process can be
performed easily using Eq. (3) from a given sequence $s$. The performance has been evaluated
theoretically from the perspective of SM. The evaluation framework is not specialised for this
perceptron-based code; rather, it is a general one, as outlined briefly below.
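
The decoder of Eq. (3), the Hamming distortion, and the compression rule of Eq. (1) can be sketched directly in Python. The encoder below enumerates all $2^N$ candidates, so it is only usable for very small $N$; the sizes and the threshold $k \approx 0.6745$ (for which a standard normal variable falls inside the band with probability about 1/2) are illustrative assumptions, not the tuned values used in the paper.

```python
import itertools
import math
import random


def f_k(u: float, k: float) -> int:
    """Nonmonotonic output function: +1 inside the band |u| < k, -1 outside."""
    return 1 if abs(u) < k else -1


def decode(s, xs, k):
    """Eq. (3): map a compressed vector s to the representative sequence."""
    n = len(s)
    return [f_k(sum(xi * si for xi, si in zip(x, s)) / math.sqrt(n), k) for x in xs]


def hamming(y, y_rep):
    """Number of positions where two +/-1 sequences differ."""
    return sum((1 - a * b) // 2 for a, b in zip(y, y_rep))


def encode_exhaustive(y, xs, k):
    """Eq. (1) by enumerating all 2^N candidates; feasible only for tiny N."""
    n = len(xs[0])
    return min(itertools.product((+1, -1), repeat=n),
               key=lambda s: hamming(y, decode(s, xs, k)))


if __name__ == "__main__":
    rng = random.Random(0)
    n, m, k = 8, 16, 0.6745  # illustrative sizes and threshold
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = [rng.choice((+1, -1)) for _ in range(m)]
    s = encode_exhaustive(y, xs, k)
    print(s, hamming(y, decode(s, xs, k)) / m)
```

Since $f_k$ depends only on $|u|$, the vectors $s$ and $-s$ decode to the same sequence, so the minimiser is never unique.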

Let us regard the distortion function $\mathcal{D}(y, \tilde{y}(s; C))$ as the Hamiltonian for the dynamical
variable $s$, which also depends on predetermined variables $y$ and $C$. The resulting distortion
(per bit) for a given $y$ and $C$ is represented as $X(y, C) = \min_s \{ M^{-1} \mathcal{D}(y, \tilde{y}(s; C)) \}$. We start
with a statistical mechanical inequality

$$ e^{-M \beta X(y, C)} \le \sum_{s} e^{-\beta \mathcal{D}(y, \tilde{y}(s; C))} = Z(\beta; y, C) = e^{-M \beta f(\beta; y, C)}, \tag{4} $$

which holds for any set of $\beta > 0$, $y$ and $C$. The physical implication of this is that the ground
state energy $X(y, C)$ (per component) is lower bounded by the free energy $f(\beta; y, C)$ (per
component) for an arbitrary temperature $\beta^{-1} > 0$. In particular, the free energy $f(\beta; y, C)$
agrees with $X(y, C)$ in the zero temperature limit $\beta \to \infty$, which is the key for the analysis.
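
The bound in inequality (4) can be checked numerically on an instance small enough to enumerate. The sketch below is an illustration under assumed toy sizes and an assumed threshold $k$; it computes the per-bit free energy with a log-sum-exp shift and compares it with the ground-state distortion per bit.

```python
import itertools
import math
import random


def distortions(y, xs, k):
    """Hamming distortion of the decoded sequence for every s in {+1,-1}^N (tiny N only)."""
    n = len(xs[0])
    out = []
    for s in itertools.product((+1, -1), repeat=n):
        d = 0
        for y_mu, x in zip(y, xs):
            u = sum(xi * si for xi, si in zip(x, s)) / math.sqrt(n)
            out_bit = 1 if abs(u) < k else -1
            d += (1 - y_mu * out_bit) // 2
        out.append(d)
    return out


def free_energy(ds, m, beta):
    """f = -(1/(M beta)) ln sum_s exp(-beta D(s)), shifted by min D for numerical stability."""
    d_min = min(ds)
    z_shifted = sum(math.exp(-beta * (d - d_min)) for d in ds)
    return d_min / m - math.log(z_shifted) / (m * beta)


if __name__ == "__main__":
    rng = random.Random(0)
    n, m, k = 8, 16, 0.6745  # illustrative sizes and threshold
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = [rng.choice((+1, -1)) for _ in range(m)]
    ds = distortions(y, xs, k)
    print("ground state per bit:", min(ds) / m)
    for beta in (0.5, 2.0, 8.0, 32.0):
        print(beta, free_energy(ds, m, beta))
```

The gap between the two quantities is at most $N \ln 2 / (M \beta)$, so it closes as $\beta$ grows.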
The distribution of the free energy $P(f; \beta)$ is expected to peak at its typical value of

$$ f_t(\beta) = -\frac{1}{M \beta} \langle \ln Z(\beta; y, C) \rangle_{y,C} = -\lim_{n \to 0} \frac{\partial}{\partial n} \frac{1}{M \beta} \ln \langle Z^n(\beta; y, C) \rangle_{y,C}, \tag{5} $$

where $\langle \cdots \rangle_{y,C}$ denotes the average over $y$ and $C$, and decays exponentially away from $f_t(\beta)$
as $P(f; \beta) \sim \exp[-M c(f, \beta)]$ for large $M$. Here, we assume that $c(f, \beta) \ge 0$ is a convex
downward function that is minimised to $0$ at $f = f_t(\beta)$. This formulation implies that,
for $\forall n \in \mathbb{R}$, the logarithm of the moment of the partition function $Z(\beta; y, C)$,
$g(n, \beta) \equiv -(1/M) \ln \langle Z^n(\beta; y, C) \rangle_{y,C}$, can be evaluated by the saddle point method as

$$ g(n, \beta) = \min_{f} \{ n \beta f + c(f, \beta) \}. \tag{6} $$

Based on the Legendre transformation (6), $c(f, \beta)$ is assessed by the inverse transformation

$$ c(f, \beta) = \max_{n} \{ -n \beta f + g(n, \beta) \}, \tag{7} $$

from $g(n, \beta)$, which can be evaluated using the RM by analytically extending expressions
obtained for $n \in \mathbb{N}$ to $n \in \mathbb{R}$.
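
As a worked illustration of the transform pair (6) and (7), consider a purely hypothetical quadratic rate function $c(f, \beta) = (f - f_t)^2 / (2\sigma^2)$, where the width $\sigma^2$ is an assumed parameter introduced only for this example. Both transforms can then be carried out in closed form:

```latex
\begin{align*}
g(n,\beta) &= \min_f \left\{ n\beta f + \frac{(f - f_t)^2}{2\sigma^2} \right\}
            = n\beta f_t - \frac{n^2 \beta^2 \sigma^2}{2}
            \quad \text{at } f^{*} = f_t - n\beta\sigma^2, \\
c(f,\beta) &= \max_n \left\{ -n\beta f + n\beta f_t - \frac{n^2 \beta^2 \sigma^2}{2} \right\}
            = \frac{(f - f_t)^2}{2\sigma^2}
            \quad \text{at } n^{*} = \frac{f_t - f}{\beta\sigma^2}, \\
\lim_{n \to 0} \frac{1}{\beta} \frac{\partial g(n,\beta)}{\partial n} &= f_t .
\end{align*}
```

The inverse transform returns the assumed $c(f, \beta)$, and the derivative of $g$ at $n \to 0$ returns the typical value $f_t$, in line with Eq. (5).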

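The above argument indicates that the typical value of the distortion averaged over the
generation of $y$ and $C$ can be evaluated as

$$ \langle X(y, C) \rangle_{y,C} = \lim_{\beta \to \infty} \lim_{n \to 0} \frac{1}{\beta} \frac{\partial g(n, \beta)}{\partial n}, \tag{8} $$

and that the average error exponent $\alpha_{\{A,B\}}(D, R)$, which is an abbreviation denoting
$\alpha_A(D, R)$ and $\alpha_B(D, R)$, can be assessed as

$$ \alpha_{\{A,B\}}(D, R) = \lim_{\beta \to \infty} c(f = D, \beta) = \lim_{\beta \to \infty} \left\{ -n \frac{\partial g(n, \beta)}{\partial n} + g(n, \beta) \right\}, \tag{9} $$

where $n$ is a function of $\beta$ that is determined by the extremum condition of Eq. (7) as
$\beta^{-1} \partial g(n, \beta) / \partial n = D$. Equations (8) and (9) constitute the basis of our approach.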

When the above general framework was applied to the random code ensemble, which is not
a practical coding scheme but can exhibit optimal performance, the theoretical limitations
(the RDF and optimal error exponents derived in IT [4]) were reproduced correctly.
These results support the validity of our theoretical framework. In addition to consistency
with the existing results, we demonstrated the wide applicability of our framework for the
perceptron-based code in earlier work, which indicated that the perceptron-based code can also
saturate the theoretical limitations in most cases.

IV. ALGORITHM BASED ON BELIEF PROPAGATION 

Calculation using the RM implies that the perceptron-based code potentially provides
optimal performance for binary memoryless sources. However, this is insufficient when it is
necessary to obtain a compressed sequence $s$ for a given finite length of original data $y$.

For the perceptron-based code, the compression phase that follows the prescription (1)
is computationally difficult because it requires a comparison over $O(2^N)$ patterns to
extract the ground state of the relevant Boltzmann distribution $P_B(s \mid y, \{x^\mu\}; \beta) =
\exp[-\beta \mathcal{D}(y, \tilde{y}(s; \{x^\mu\}))] / Z(\beta; y, \{x^\mu\})$. The Boltzmann factor is rewritten here for the sake
of subsequent expressions as

$$ \exp[-\beta \mathcal{D}(y, \tilde{y}(s; \{x^\mu\}))] = \prod_{\mu=1}^{M} \Xi_{k, y^\mu}\left( \frac{1}{\sqrt{N}} \sum_{i=1}^{N} x_i^\mu s_i \right), \tag{10} $$

where we define $\Xi_{k, y^\mu}(z) = \exp[-(\beta/2)\{1 - y^\mu f_k(z)\}]$. We require computationally tractable
algorithms that generate a probable state $s$ from the Boltzmann distribution.

BP is known as a promising approach for such tasks. It is an iterative algorithm
that efficiently calculates the marginal posterior probabilities
$P_{s_l}(s_l \mid y, \{x^\mu\}; \beta) = \sum_{s \setminus s_l} P_B(s \mid y, \{x^\mu\}; \beta)$ based on the property that the
potential function is factored as shown in Eq. (10). In general, the fixed point of this
algorithm provides the solution of the Bethe approximation [9] known in SM.

To introduce this algorithm to the current system, let us graphically describe this
factorization, denoting the predetermined variables ($y^\mu$ and $x^\mu$) and compressed sequence ($s_l$) by
two kinds of nodes, then connecting them by an edge when they are included in a common
factor, which can be expressed as a complete bipartite graph shown in FIG. 1.

FIG. 1: Graphical representation of the dependency of variables for the perceptron-based code.
Each pair of predetermined variables ($y^\mu$, $x^\mu$) is related with every bit of the compressed sequence.

On that graph, BP can be represented as an algorithm that passes messages between the
two kinds of nodes through edges as (normalisation constants are omitted)

$$ \hat{\rho}^{\,t+1}_{\mu l}(s_l) = \sum_{s \setminus s_l} \Xi_{k, y^\mu}\left( \frac{1}{\sqrt{N}} \sum_{i=1}^{N} x_i^\mu s_i \right) \prod_{j \ne l} \rho^{t}_{j \mu}(s_j), \tag{11} $$

$$ \rho^{t}_{l \mu}(s_l) = \prod_{\nu \ne \mu}^{M} \hat{\rho}^{\,t}_{\nu l}(s_l), \tag{12} $$

where $t = 1, 2, \ldots$ is an index for counting the number of updates. The marginalised posterior
at the $t$th update is given by $P_{s_l}(s_l \mid y, \{x^\mu\}; \beta) \propto \prod_{\mu=1}^{M} \hat{\rho}^{\,t}_{\mu l}(s_l)$.

Since the computational cost for the summation in Eq. (11) grows exponentially in
$N$, it is extremely difficult to perform this algorithm exactly. However, because $x^\mu$ are
generated independently from $P(x) = (\sqrt{2\pi})^{-N} \exp[-|x|^2/2]$, this summation can be well
approximated by a one-dimensional integral of a Gaussian distribution [10], the centre and
the variance of which are $\Delta^t_{\mu l} = ( \sum_{i \ne l} x_i^\mu m^t_{i \mu} ) / \sqrt{N}$ and $1 - q^t_{\mu l}$, where $q^t_{\mu l} = \{ \sum_{i \ne l} (m^t_{i \mu})^2 \} / N$,
respectively. This approximation makes it possible to carry out the belief updates (11) and
(12) in a practical time scale, providing a set of self-consistent equations

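$$ \hat{m}^{t+1}_{\mu l} = \frac{x_l^\mu}{\sqrt{N}} \, \frac{\int Dz \, \Xi'_{k, y^\mu}\left( \Delta^t_{\mu l} + \sqrt{1 - q^t_{\mu l}} \, z \right)}{\int Dz \, \Xi_{k, y^\mu}\left( \Delta^t_{\mu l} + \sqrt{1 - q^t_{\mu l}} \, z \right)}, \tag{13} $$

$$ m^{t}_{l \mu} = \tanh\left( \sum_{\nu \ne \mu} \tanh^{-1} \hat{m}^{t}_{\nu l} \right), \tag{14} $$

where we define $Dz = (dz / \sqrt{2\pi}) \exp(-z^2/2)$, and $f'(x) = df(x)/dx$ for any function $f(x)$.
Employing these variables, the approximated posterior average of $s_l$ at the $t$th update can
be computed as $m^t_l = \tanh\left( \sum_{\mu=1}^{M} \tanh^{-1} \hat{m}^t_{\mu l} \right) \approx \tanh\left( \sum_{\mu=1}^{M} \hat{m}^t_{\mu l} \right)$.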
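
The Gaussian approximation rests on the first two moments of the field $u = \sum_i x_i s_i / \sqrt{N}$ when the $s_i$ are treated as independent with means $m_i$. The sketch below computes those moments exactly and compares the variance with $1 - q$, where $q = \sum_i m_i^2 / N$. For simplicity it sums over all $i$ rather than excluding one index, and the sizes and the distribution of the $m_i$ are illustrative assumptions.

```python
import math
import random


def field_moments(x, m):
    """Exact mean and variance of u = sum_i x_i s_i / sqrt(N) for independent s_i with <s_i> = m_i."""
    n = len(x)
    mean = sum(xi * mi for xi, mi in zip(x, m)) / math.sqrt(n)
    var = sum(xi * xi * (1.0 - mi * mi) for xi, mi in zip(x, m)) / n
    return mean, var


if __name__ == "__main__":
    rng = random.Random(0)
    n = 2000
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = [rng.uniform(-0.9, 0.9) for _ in range(n)]
    q = sum(mi * mi for mi in m) / n
    mean, var = field_moments(x, m)
    print("centre:", mean, "variance:", var, "1 - q:", 1.0 - q)
```

The exact variance is $\sum_i x_i^2 (1 - m_i^2) / N$; replacing $x_i^2$ by its average value of 1 gives $1 - q$, and the difference shrinks as $N$ grows.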

The number of variables can be further reduced to $O(M)$ when $M$ and $N$ are large by employing
Eq. (14) [10]. One can approximately transform Eq. (14) into $m^t_{l \mu} \approx m^t_l - \{ 1 - (m^t_l)^2 \} \hat{m}^t_{\mu l}$.
Utilising this equation and taking into consideration that the influence of each element in
the sum is sufficiently small compared to the remainder, the following approximations hold.

$$ \Delta^t_{\mu l} \approx \frac{1}{\sqrt{N}} \sum_{i=1}^{N} x_i^\mu m^t_i - \frac{1}{\sqrt{N}} \sum_{i=1}^{N} x_i^\mu \{ 1 - (m^t_i)^2 \} \hat{m}^t_{\mu i} - \frac{x_l^\mu m^t_l}{\sqrt{N}}, \tag{15} $$

$$ q^t_{\mu l} \approx \frac{1}{N} \sum_{i=1}^{N} (m^t_i)^2 \equiv q^t. \tag{16} $$

Because the last term in Eq. (15) is infinitesimal, the Taylor expansion is applicable to Eq.
(13), providing



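$$ \hat{m}^{t+1}_{\mu l} \approx \frac{x_l^\mu}{\sqrt{N}} \, \frac{\int Dz \, \Xi'_{k, y^\mu}(U^t_\mu)}{\int Dz \, \Xi_{k, y^\mu}(U^t_\mu)} - \frac{(x_l^\mu)^2 m^t_l}{N} \left\{ \frac{\int Dz \, \Xi''_{k, y^\mu}(U^t_\mu)}{\int Dz \, \Xi_{k, y^\mu}(U^t_\mu)} - \left[ \frac{\int Dz \, \Xi'_{k, y^\mu}(U^t_\mu)}{\int Dz \, \Xi_{k, y^\mu}(U^t_\mu)} \right]^2 \right\}, \tag{17} $$

where we define $\Delta^t_\mu = ( \sum_{i=1}^{N} x_i^\mu m^t_i ) / \sqrt{N}$ and write $U^t_\mu$ for the argument
$\Delta^t_\mu - N^{-1/2} \sum_{i=1}^{N} x_i^\mu \{ 1 - (m^t_i)^2 \} \hat{m}^t_{\mu i} + \sqrt{1 - q^t} \, z$ obtained from Eqs. (15) and (16).
It is important to notice that the second term on the right-hand side of Eq. (17), which is
negligible compared to the first term, becomes
an influential element when the posterior average $m^t_l$ is calculated.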



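Using the new notations $a^t_\mu$ and $G^t$, and replacing $(x_l^\mu)^2$ by its average value of 1, the
compression algorithm is finally expressed as

$$ a^{t+1}_\mu = \frac{\int Dz \, \Xi'_{k, y^\mu}(U^t_\mu)}{\int Dz \, \Xi_{k, y^\mu}(U^t_\mu)}, \tag{18} $$

$$ G^{t} = \sum_{\mu=1}^{M} \left\{ \frac{\int Dz \, \Xi''_{k, y^\mu}(U^t_\mu)}{\int Dz \, \Xi_{k, y^\mu}(U^t_\mu)} - \left[ \frac{\int Dz \, \Xi'_{k, y^\mu}(U^t_\mu)}{\int Dz \, \Xi_{k, y^\mu}(U^t_\mu)} \right]^2 \right\} \quad \left( \text{with } U^t_\mu = \Delta^t_\mu - (1 - q^t) a^t_\mu + \sqrt{1 - q^t} \, z \right), \tag{19} $$

$$ m^{t}_l = \tanh\left( \sum_{\mu=1}^{M} \frac{x_l^\mu a^{t}_\mu}{\sqrt{N}} - \frac{G^{t-1} m^{t-1}_l}{N} \right). \tag{20} $$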



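For the perceptron-based code, we can calculate

$$ \int Dz \, \Xi_{k, y^\mu}(U^t_\mu) = e^{-\beta} + (1 - e^{-\beta}) \left\{ y^\mu H(w_-) - y^\mu H(w_+) - (y^\mu - 1)/2 \right\}, \tag{21} $$

$$ \int Dz \, \Xi'_{k, y^\mu}(U^t_\mu) = \frac{(1 - e^{-\beta}) \, y^\mu}{\sqrt{2\pi (1 - q^t)}} \left\{ \exp\left( -\frac{w_-^2}{2} \right) - \exp\left( -\frac{w_+^2}{2} \right) \right\}, \tag{22} $$

$$ \int Dz \, \Xi''_{k, y^\mu}(U^t_\mu) = \frac{(1 - e^{-\beta}) \, y^\mu}{\sqrt{2\pi} \, (1 - q^t)} \left\{ w_- \exp\left( -\frac{w_-^2}{2} \right) - w_+ \exp\left( -\frac{w_+^2}{2} \right) \right\}, \tag{23} $$

where

$$ w_\pm = \frac{\pm k - \Delta^t_\mu + (1 - q^t) a^t_\mu}{\sqrt{1 - q^t}}, \qquad H(x) = \int_x^{\infty} \frac{dz}{\sqrt{2\pi}} \exp\left( -\frac{z^2}{2} \right). $$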
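
The Gaussian integrals of $\Xi_{k,y}$ and its first two derivatives can be written in closed form for the band-shaped output function $f_k$, using only the definition $\Xi_{k,y}(z) = \exp[-(\beta/2)\{1 - y f_k(z)\}]$. The sketch below is our reading of those closed forms (intended to correspond to Eqs. (21)-(23)); the function and argument names are illustrative, and `center` stands for the non-random part of the argument, $\Delta - (1 - q) a$.

```python
import math


def H(x: float) -> float:
    """Gaussian tail: integral from x to infinity of exp(-z^2/2)/sqrt(2 pi) dz."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def xi_integrals(center, q, y, k, beta):
    """Return (int Dz Xi, int Dz Xi', int Dz Xi'') for the argument center + sqrt(1-q) z."""
    sigma = math.sqrt(1.0 - q)
    w_plus = (k - center) / sigma
    w_minus = (-k - center) / sigma
    gap = 1.0 - math.exp(-beta)
    e_plus = math.exp(-0.5 * w_plus * w_plus)
    e_minus = math.exp(-0.5 * w_minus * w_minus)
    # Probability mass inside the band is H(w_minus) - H(w_plus).
    xi0 = math.exp(-beta) + gap * (y * H(w_minus) - y * H(w_plus) - (y - 1) / 2.0)
    # Derivatives are taken with respect to the centre of the Gaussian.
    xi1 = gap * y * (e_minus - e_plus) / math.sqrt(2.0 * math.pi * (1.0 - q))
    xi2 = gap * y * (w_minus * e_minus - w_plus * e_plus) / (math.sqrt(2.0 * math.pi) * (1.0 - q))
    return xi0, xi1, xi2


if __name__ == "__main__":
    print(xi_integrals(center=0.3, q=0.4, y=+1, k=0.8, beta=1.5))
```

A central finite difference of the first returned value with respect to `center` reproduces the second, and likewise the second reproduces the third, which is a convenient consistency check on the signs.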

The exact solution of $m_l$ trivially vanishes because of the mirror symmetry
$\Xi_{k, y^\mu}(U^t_\mu) = \Xi_{k, y^\mu}(-U^t_\mu)$. This fact implies that one cannot determine the more
probable sign of $s_l$ even if the update iteration is successful. A similar phenomenon was also
reported in codes of another type [11, 12]. To resolve this problem, we heuristically introduce
an inertia term, which has been employed for lossy compression of an unbiased source,
into Eq. (20) as $m^t_l = \tanh\left[ \sum_{\mu=1}^{M} x_l^\mu a^t_\mu / \sqrt{N} - G^{t-1} m^{t-1}_l / N + \tanh^{-1}(\gamma m^{t-1}_l) \right]$, where
$\gamma$ is a constant ($0 < \gamma < 1$).

FIG. 2: Compression performance of BP for p = 0.2, 0.5 and 0.8 (vertical axis: average bit
error rate). In the region of the low compression rate, the performance approaches the RDF
in the case of p = 0.5 and 0.8.
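
The update rules (18)-(20) with the inertia term can be put together as in the following sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the small random initial values of $m_l$ (needed because $m_l = 0$ is a fixed point under the mirror symmetry), the zero initial $a_\mu$, the clamp on $q$, and the parameter values in the demo are all our own choices. The closed forms for the Gaussian integrals are our reading of Eqs. (21)-(23).

```python
import math
import random


def H(x):
    """Gaussian tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def xi_integrals(center, q, y, k, beta):
    """Gaussian integrals of Xi, Xi', Xi'' for the argument center + sqrt(1-q) z."""
    sigma = math.sqrt(1.0 - q)
    w_p, w_m = (k - center) / sigma, (-k - center) / sigma
    gap = 1.0 - math.exp(-beta)
    e_p, e_m = math.exp(-0.5 * w_p * w_p), math.exp(-0.5 * w_m * w_m)
    xi0 = math.exp(-beta) + gap * (y * H(w_m) - y * H(w_p) - (y - 1) / 2.0)
    xi1 = gap * y * (e_m - e_p) / math.sqrt(2.0 * math.pi * (1.0 - q))
    xi2 = gap * y * (w_m * e_m - w_p * e_p) / (math.sqrt(2.0 * math.pi) * (1.0 - q))
    return xi0, xi1, xi2


def bp_compress(y, xs, k, beta, gamma, n_iter=35, seed=0):
    """Approximate encoder: iterate the reduced BP updates and return sgn(m)."""
    rng = random.Random(seed)
    m_data, n = len(y), len(xs[0])
    sqrt_n = math.sqrt(n)
    m = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # assumed initial condition
    a = [0.0] * m_data                                # assumed initial condition
    for _ in range(n_iter):
        # Clamp q below 1 to avoid division by zero (numerical safeguard, our assumption).
        q = min(sum(v * v for v in m) / n, 1.0 - 1e-6)
        a_new, g = [], 0.0
        for mu in range(m_data):
            delta = sum(x * v for x, v in zip(xs[mu], m)) / sqrt_n
            xi0, xi1, xi2 = xi_integrals(delta - (1.0 - q) * a[mu], q, y[mu], k, beta)
            a_new.append(xi1 / xi0)                   # Eq. (18)
            g += xi2 / xi0 - (xi1 / xi0) ** 2         # Eq. (19)
        a = a_new
        # Eq. (20) with the inertia term tanh^{-1}(gamma * m) added inside the tanh.
        m = [math.tanh(sum(xs[mu][l] * a[mu] for mu in range(m_data)) / sqrt_n
                       - g * m[l] / n + math.atanh(gamma * m[l]))
             for l in range(n)]
    return [1 if v >= 0.0 else -1 for v in m]


if __name__ == "__main__":
    rng = random.Random(1)
    n, m_data, k = 40, 80, 0.6745  # illustrative sizes and threshold, R = N/M = 0.5
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m_data)]
    y = [rng.choice((+1, -1)) for _ in range(m_data)]
    s = bp_compress(y, xs, k, beta=2.0, gamma=0.3)
    errors = 0
    for mu in range(m_data):
        u = sum(x * v for x, v in zip(xs[mu], s)) / math.sqrt(n)
        errors += (1 if abs(u) < k else -1) != y[mu]
    print("bit error rate:", errors / m_data)
```

Each sweep costs $O(NM)$ operations, in contrast to the $O(2^N)$ enumeration required by Eq. (1). The quality of the result depends on $k$, $\beta$ and $\gamma$, which are not tuned here.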

Experimental results are shown in FIG. 2 together with the RDFs. Given original data
generated from a binary stationary distribution $P(y^\mu = +1) = 1 - P(y^\mu = -1) = p$ and vectors
$\{x^\mu\}$, we compressed the original data into a shorter sequence using BP. The final value
of $s_l$ was determined as $s_l = \mathrm{sgn}[m^t_l]$. The values of $k$ and $\beta$ were set to theoretically optimal
values evaluated from the RM-based analysis; the value of $\gamma$ was determined by trial and
error. In the figure, the bit error rates averaged over 100 runs are plotted as a function
of the compression rate $R$ for bias $p$ = 0.2, 0.5 and 0.8. While the length of the compressed
sequence was fixed to $N = 1000$ bits ($N = 500$ when $R < 0.2$), the length of the original data was
adjusted in accordance with the compression rate. We stopped the iteration at the 35th
update and determined the compressed sequence from the result at that time, even if the
algorithm did not converge. As the compression rate becomes smaller, the performance
approaches the RDF in the case of $p$ = 0.5 and 0.8. In particular, the performance for
$p = 0.5$ and $R < 0.4$ is superior to results reported in the IT literature for a binary memoryless
source [13]. However, the results for $p = 0.2$ are poor compared to those for
$p = 0.8$, even though the two cases are equivalent from the perspective of information;
this might be the result of asymmetric influences of input-output relations between those
two cases. Improvement of this behaviour is a subject of future work.
V. SUMMARY



We have investigated the performance of lossy data compression for uniformly biased
binary data. Analyses based on the RM indicate the great potential of the perceptron-based
code, which is also partially confirmed by a practically tractable algorithm based on BP. A
close relationship between the macroscopic dynamics of BP and the replica analysis was
recently reported. Investigation in such a direction in the current case is under way.